AITopics | optimal stopping

Motivated by stochastic optimization, we introduce the problem of learning from samples of contextual value distributions. A contextual value distribution can be understood as a family of real-valued distributions, where each sample consists of a context $x$ and a random variable drawn from the corresponding real-valued distribution $D_x$. By minimizing a convex surrogate loss, we learn an empirical distribution $D'_x$ for each context, ensuring a small Lévy distance to $D_x$. We apply this result to obtain the sample complexity bounds for the learning of an $ε$-optimal policy for stochastic optimization problems defined on an unknown contextual value distribution. The sample complexity is shown to be polynomial for the general case of strongly monotone and stable optimization problems, including Single-item Revenue Maximization, Pandora's Box and Optimal Stopping.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2505.16829

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions

Kuchko, Anna

arXiv.org Machine Learning, Mar-5-2025

We consider the Inverse Optimal Stopping (IOS) problem where, based on stopped expert trajectories, one aims to recover the optimal stopping region by approximating the continuation and stopping gain functions. The uniqueness of the stopping region allows the use of IOS in real-world applications with safety concerns. While current state-of-the-art inverse reinforcement learning methods recover both a Q-function and the corresponding optimal policy, they fail to account for specific challenges posed by optimal stopping problems. These include data sparsity near the stopping region, the non-Markovian nature of the continuation gain, a proper treatment of boundary conditions, the need for a stable offline approach for risk-sensitive applications, and the lack of a quality evaluation metric. These challenges are addressed with the proposed Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping (DO-IQS), which incorporates temporal information by approximating the cumulative continuation gain together with the world dynamics and the Q-function without querying the environment. Moreover, a confidence-based oversampling approach is proposed to treat the data sparsity problem. We demonstrate the performance of our models on real and artificial data, including an optimal intervention for critical events problem.

algorithm, optimal, preprint, (12 more...)

arXiv.org Machine Learning

2503.03515

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Data-Driven Estimation of Conditional Expectations, Application to Optimal Stopping and Reinforcement Learning

Moustakides, George V.

arXiv.org Artificial Intelligence, Jul-18-2024

When the underlying conditional density is known, conditional expectations can be computed analytically or numerically. When, however, such knowledge is not available and instead we are given a collection of training data, the goal of this work is to propose simple and purely data-driven means for estimating directly the desired conditional expectation. Because conditional expectations appear in the description of a number of stochastic optimization problems with the corresponding optimal solution satisfying a system of nonlinear equations, we extend our data-driven method to cover such cases as well. We test our methodology by applying it to Optimal Stopping and Optimal Action Policy in Reinforcement Learning.
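As a point of reference for the abstract above: when the conditioning variable takes finitely many values, the least-squares estimate of a conditional expectation from training pairs is simply the per-value sample mean. The sketch below shows only that baseline. It is not the estimator proposed in the paper, and the function name `conditional_means` and the toy data are hypothetical.

```python
from collections import defaultdict


def conditional_means(pairs):
    """Estimate E[Y | X = x] for each observed x by the sample mean of y.

    `pairs` is an iterable of (x, y) with hashable x and numeric y.
    For a discrete x, this per-group mean is the minimizer of the
    empirical squared error sum((y - c_x) ** 2) over constants c_x.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in pairs:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


if __name__ == "__main__":
    # Hypothetical training data: two contexts, three observations.
    data = [(0, 1.0), (0, 3.0), (1, 10.0)]
    print(conditional_means(data))
```

For continuous contexts the same squared-error criterion would be minimized over a function class (for example a regression model) rather than over one constant per group.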

conditional expectation, linewidth, sqrt, (12 more...)

arXiv.org Artificial Intelligence

2407.13189

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Approximation of Convex Envelope Using Reinforcement Learning

Borkar, Vivek S., Akarsh, Adit

arXiv.org Artificial Intelligence, Nov-24-2023

Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for controlled optimal stopping. It shows very promising results on a standard library of test problems.

convex envelope, equation, obtained convex envelope, (11 more...)

arXiv.org Artificial Intelligence

2311.14421

Country:

Asia > India > Maharashtra > Mumbai (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimal Stopping and Effective Machine Complexity in Learning

Neural Information Processing Systems, Apr-6-2023, 18:51:21 GMT

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached.

learning, optimal stopping, stopping and effective machine complexity

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimal Stopping with Gaussian Processes

Dwarakanath, Kshama, Dervovic, Danial, Tavallali, Peyman, Vyetrenko, Svitlana S, Balch, Tucker

arXiv.org Artificial Intelligence, Oct-7-2022

We propose a novel group of Gaussian Process based algorithms for fast approximate optimal stopping of time series with specific applications to financial markets. We show that structural properties commonly exhibited by financial time series (e.g., the tendency to mean-revert) allow the use of Gaussian and Deep Gaussian Process models that further enable us to analytically evaluate optimal stopping value functions and policies. We additionally quantify ...

Functional data analysis has long been used in modeling time series enabling long term predictions with the ability to work with irregularly sampled data [7]. In time series modeling, approaches based on Gaussian Processes (GPs) allow long term forecasting in settings with small quantities of data for calibration and those with a need to estimate the covariance of predictions [30, 17]. GPs also come up in finance when studying mean reverting processes called Ornstein-Uhlenbeck (OU) processes which are GPs with an exponential kernel [29].

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2209.14738

Country:

North America > United States > New York > New York County > New York City (0.15)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry:

Banking & Finance > Trading (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering

Fathan, Abderrahim, Delage, Erick

arXiv.org Artificial Intelligence, May-18-2021

Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system, in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn optimal stopping policies in two financial engineering applications: namely option pricing, and optimal option exercise. We present for the first time a comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state of the art deep RL algorithms: double deep Q-learning (DDQN), categorical distributional RL (C51), and Implicit Quantile Networks (IQN). In the case of option pricing, our findings indicate that in a theoretical Black-Scholes environment, IQN successfully identifies nearly optimal prices. On the other hand, it is slightly outperformed by C51 when confronted with real stock data movements in a put option exercise problem that involves assets from the S&P500 index. More importantly, the C51 algorithm is able to identify an optimal stopping policy that achieves 8% more out-of-sample returns than the best of four natural benchmark policies. We conclude with a discussion of our findings which should pave the way for relevant future research.
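The definition in the first sentence of this abstract can be made concrete with a toy problem that is not taken from the paper: roll a fair die up to n times and stop whenever you like, keeping the face shown at the moment you stop. Backward induction gives the optimal rule (stop when the face is at least the value of continuing). The function name `dice_stopping_values` is hypothetical, and this exact dynamic program stands in for the learned Q-functions the paper evaluates.

```python
from fractions import Fraction


def dice_stopping_values(n_rolls, faces=6):
    """Return a list V where V[k] is the expected reward with k rolls left.

    Toy finite-horizon optimal stopping problem (an illustration only):
    after each roll you either stop and keep the face value, or continue
    and receive V[k - 1] in expectation. V[0] = 0 means no rolls remain.
    """
    values = [Fraction(0)]
    for _ in range(n_rolls):
        continuation = values[-1]
        # Optimal rule: stop exactly when the face beats the continuation value.
        expected = sum(
            max(Fraction(face), continuation) for face in range(1, faces + 1)
        ) / faces
        values.append(expected)
    return values


if __name__ == "__main__":
    for k, v in enumerate(dice_stopping_values(3)):
        print(k, v)
```

With three rolls available the values are 7/2, 17/4 and 14/3 for one, two and three rolls remaining, so the stopping threshold rises as more rolls remain.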

algorithm, optimal, optimal stopping, (14 more...)

arXiv.org Artificial Intelligence

2105.08877

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Optimal Stopping via Randomized Neural Networks

Herrera, Calypso, Krach, Florian, Ruyssen, Pierre, Teichmann, Josef

arXiv.org Machine Learning, Apr-28-2021

This paper presents new machine learning approaches to approximate the solution of optimal stopping problems. The key idea of these methods is to use neural networks, where the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable for high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using a simple linear regression, they are very easy to implement and theoretical guarantees can be provided. In Markovian examples our randomized reinforcement learning approach and in non-Markovian examples our randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches.
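The key idea named in this abstract (a randomly generated hidden layer that stays fixed, with only the last layer fitted by linear regression) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it shows a single regression step on a put-style payoff rather than a full backward recursion over exercise dates, the small ridge term is added here for numerical stability, and all function names are hypothetical.

```python
import math
import random


def random_hidden_layer(n_hidden, seed=0):
    """Draw fixed (weight, bias) pairs for a 1-D input; these are never trained."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_hidden)]


def features(x, hidden):
    """Constant term plus tanh activations of the random hidden units."""
    return [1.0] + [math.tanh(w * x + b) for w, b in hidden]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_last_layer(xs, ys, hidden, ridge=1e-3):
    """Fit only the output weights by ridge-regularized least squares."""
    phi = [features(x, hidden) for x in xs]
    k = len(phi[0])
    gram = [
        [sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict(x, hidden, weights):
    """Evaluate the fitted network at x."""
    return sum(w * f for w, f in zip(weights, features(x, hidden)))


if __name__ == "__main__":
    hidden = random_hidden_layer(10, seed=0)
    xs = [2.0 * i / 49 for i in range(50)]   # toy states on [0, 2]
    ys = [max(1.0 - x, 0.0) for x in xs]     # toy put-style regression target
    weights = fit_last_layer(xs, ys, hidden)
    print(predict(0.5, hidden, weights))
```

Because the hidden weights are frozen, fitting reduces to one linear solve, which is the property the abstract credits for ease of implementation.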

algorithm, neural network, tsitsiklis and van roy, (12 more...)

arXiv.org Machine Learning

2104.13669

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Vietnam > Long An Province > Tân An (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Optimal Stopping and Effective Machine Complexity in Learning

Wang, Changfeng, Venkatesh, Santosh S., Judd, J. Stephen

Neural Information Processing Systems, Dec-31-1994

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the tradeoff between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection.

1 INTRODUCTION

The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [1] and many others [5, 1, 10, 3, 11] all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimension generalizes better. However, it is noted that these methods or criteria do not necessarily lead to optimal (or nearly optimal) generalization performance.

effective size, generalization error, stopping and effective machine complexity, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
